634 research outputs found

    Regularized approximate policy iteration using kernel for on-line reinforcement learning

    Get PDF
    By using Reinforcement Learning (RL), an autonomous agent interacting with the environment can learn how to take adequate actions for every situation in order to optimally achieve its own goal. RL provides a general methodology able to solve uncertain and complex decision problems which may be present in many real-world applications. RL problems are usually modeled as a Markov Decision Processes (MDPs) deeply studied in the literature. The main peculiarity of a RL algorithm is that the RL agent is assumed to learn the optimal policies from its experiences without knowing the parameters of the MDP. The key element in solving the MDP is learning a value function which gives the expectation of total reward an agent might expect at its current state taking a given action. This value function allows to obtain the optimal policy. In this thesis we study the capacity of SVR using kernel methods to adapt and solve complex RL problems in large or continuous state space. SVR can be studied using a geometrical interpretation in terms of optimal margin or can be seen as a regularization problem given in a Reproducing Kernel Hilbert Space (RKHS) SVR have good properties over the generalization ability and as they are based a on convex optimization problem, they do not suffer from sub-optimality. SVR are non-parametric showing the ability to automatically adapt to the complexity of the problem. Accordingly, applying SVR to approximate value functions sounds to be a good approach. SVR can be solved both in batch mode when the whole set of training sample are at disposal of the learning agents or incrementally which enables the addition or removal of training samples very effectively. Incremental SVR finds the appropriate KKT conditions for new or updated data by modifying their influences into the regression function maintaining consistence in the KKT conditions for the rest of data used for learning. In RL problems an incremental SVR should be able to approximate the action value function leading to the optimal policy. Accordingly, computation load should be lower, learning speed faster and generalization more effective than other existing method The overall contribution coming from of our work is to develop, formalize, implement and study a new RL technique for generalization in discrete and continuous state spaces with finite actions. Our method uses the Approximate Policy Iteration (API) framework with the BRM criterion which allows to represent the action value function using SVR. This approach for RL is the first one we know using SVR compatible to the agent interaction- with-the-environment framework of RL which shows his power by solving a large number of benchmark problems, including very difficult ones, like the bicycle driving and riding control problem. In addition, unlike most RL approaches to generalization, we develop a proof finding theoretical bounds for the convergence of the method to the optimal solution under given conditions.Mediante el uso de aprendizaje por refuerzo (RL), un agente autónomo interactuando con el medio ambiente puede aprender a tomar adecuada acciones para cada situación con el fin de lograr de manera óptima su propia meta. RL proporciona una metodología general capaz de resolver problemas de decisión complejos que pueden estar presentes en muchas aplicaciones del mundo real. Problemas RL usualmente se modelan como una Procesos de Decisión de Markov (MDP) estudiados profundamente en la literatura. La principal peculiaridad de un algoritmo de RL es que el agente es asumido para aprender las políticas óptimas de sus experiencias sin saber los parámetros de la MDP. El elemento clave en resolver el MDP está en el aprender una función de valor que da la expectativa de recompensa total que un agente puede esperar en su estado actual para tomar una acción determinada. Esta función de valor permite obtener la política óptima. En esta tesis se estudia la capacidad del SVR utilizando núcleo métodos para adaptarse y resolver problemas RL complejas en el espacio estado grande o continua. RVS puede ser estudiado mediante un interpretación geométrica en términos de margen óptimo o puede ser visto como un problema de regularización dado en un Reproducing Kernel Hilbert Space (RKHS). SVR tiene buenas propiedades sobre la capacidad de generalización y ya que se basan en una optimización convexa problema, ellos no sufren de sub-optimalidad. SVR son no paramétrico que muestra la capacidad de adaptarse automáticamente a la complejidad del problema. En consecuencia, la aplicación de RVS para aproximar funciones de valor suena para ser un buen enfoque. SVR puede resolver tanto en modo batch cuando todo el conjunto de muestra de entrenamiento están a disposición de los agentes de aprendizaje o incrementalmente que permite la adición o eliminación de muestras de entrenamiento muy eficaz. Incremental SVR encuentra las condiciones adecuadas para KKT nuevas o actualizadas de datos modificando sus influencias en la función de regresión mantener consistencia en las condiciones KKT para el resto de los datos utilizados para el aprendizaje. En los problemas de RL una RVS elemental será capaz de aproximar la función de valor de acción que conduce a la política óptima. En consecuencia, la carga de cálculo debería ser menor, la velocidad de aprendizaje más rápido y generalización más efectivo que el otro método existente La contribución general que viene de nuestro trabajo es desarrollar, formalizar, ejecutar y estudiar una nueva técnica de RL para la generalización en espacio de estados discretos y continuos con acciones finitas. Nuestro método utiliza el marco de la Approximate Policy Iteration (API) con el criterio de BRM que permite representar la función de valor de acción utilizando SVR. Este enfoque de RL es el primero que conocemos usando SVR compatible con el marco de RL con agentes interaccionado con el ambiente que muestra su poder mediante la resolución de un gran número de problemas de referencia, incluyendo los muy difíciles, como la conducción de bicicletas y problema de control de conducción. Además, a diferencia de la mayoría RL se acerca a la generalización, desarrollamos un hallazgo prueba límites teóricos para la convergencia del método a la solución óptima en condiciones dadas.Postprint (published version

    The scalar wave equation in a non-commutative spherically symmetric space-time

    Full text link
    Recent work in the literature has studied a version of non-commutative Schwarzschild black holes where the effects of non-commutativity are described by a mass function depending on both the radial variable r and a non-commutativity parameter theta. The present paper studies the asymptotic behaviour of solutions of the zero-rest-mass scalar wave equation in such a modified Schwarzschild space-time in a neighbourhood of spatial infinity. The analysis is eventually reduced to finding solutions of an inhomogeneous Euler--Poisson--Darboux equation, where the parameter theta affects explicitly the functional form of the source term. Interestingly, for finite values of theta, there is full qualitative agreement with general relativity: the conformal singularity at spacelike infinity reduces in a considerable way the differentiability class of scalar fields at future null infinity. In the physical space-time, this means that the scalar field has an asymptotic behaviour with a fall-off going on rather more slowly than in flat space-time.Comment: 19 pages, Revtex4, 7 figure

    Free Energy, Enthalpy and Entropy from Implicit Solvent End-Point Simulations

    Get PDF
    Free energy is the key quantity to describe the thermodynamics of biological systems. In this perspective we consider the calculation of free energy, enthalpy and entropy from end-point molecular dynamics simulations. Since the enthalpy may be calculated as the ensemble average over equilibrated simulation snapshots the difficulties related to free energy calculation are ultimately related to the calculation of the entropy of the system and in particular of the solvent entropy. In the last two decades implicit solvent models have been used to circumvent the problem and to take into account solvent entropy implicitly in the solvation terms. More recently outstanding advancement in both implicit solvent models and in entropy calculations are making the goal of free energy estimation from end-point simulations more feasible than ever before. We review briefly the basic theory and discuss the advancements in light of practical applications. \ua9 2018 Fogolari, Corazza and Esposito

    Quantum Effects in Friedmann-Robertson-Walker Cosmologies

    Get PDF
    Electrodynamics for self-interacting scalar fields in spatially flat Friedmann-Robertson-Walker space-times is studied. The corresponding one-loop field equation for the expectation value of the complex scalar field in the conformal vacuum is derived. For exponentially expanding universes, the equations for the Bogoliubov coefficients describing the coupling of the scalar field to gravity are solved numerically. They yield a non-local correction to the Coleman-Weinberg effective potential which does not modify the pattern of minima found in static de Sitter space. Such a correction contains a dissipative term which, accounting for the decay of the classical configuration in scalar field quanta, may be relevant for the reheating stage. The physical meaning of the non-local term in the semiclassical field equation is investigated by evaluating this contribution for various background field configurations.Comment: 17 pages, plain TeX + 5 uuencoded figure

    Regularized approximate policy iteration using kernel for on-line reinforcement learning

    Get PDF
    By using Reinforcement Learning (RL), an autonomous agent interacting with the environment can learn how to take adequate actions for every situation in order to optimally achieve its own goal. RL provides a general methodology able to solve uncertain and complex decision problems which may be present in many real-world applications. RL problems are usually modeled as a Markov Decision Processes (MDPs) deeply studied in the literature. The main peculiarity of a RL algorithm is that the RL agent is assumed to learn the optimal policies from its experiences without knowing the parameters of the MDP. The key element in solving the MDP is learning a value function which gives the expectation of total reward an agent might expect at its current state taking a given action. This value function allows to obtain the optimal policy. In this thesis we study the capacity of SVR using kernel methods to adapt and solve complex RL problems in large or continuous state space. SVR can be studied using a geometrical interpretation in terms of optimal margin or can be seen as a regularization problem given in a Reproducing Kernel Hilbert Space (RKHS) SVR have good properties over the generalization ability and as they are based a on convex optimization problem, they do not suffer from sub-optimality. SVR are non-parametric showing the ability to automatically adapt to the complexity of the problem. Accordingly, applying SVR to approximate value functions sounds to be a good approach. SVR can be solved both in batch mode when the whole set of training sample are at disposal of the learning agents or incrementally which enables the addition or removal of training samples very effectively. Incremental SVR finds the appropriate KKT conditions for new or updated data by modifying their influences into the regression function maintaining consistence in the KKT conditions for the rest of data used for learning. In RL problems an incremental SVR should be able to approximate the action value function leading to the optimal policy. Accordingly, computation load should be lower, learning speed faster and generalization more effective than other existing method The overall contribution coming from of our work is to develop, formalize, implement and study a new RL technique for generalization in discrete and continuous state spaces with finite actions. Our method uses the Approximate Policy Iteration (API) framework with the BRM criterion which allows to represent the action value function using SVR. This approach for RL is the first one we know using SVR compatible to the agent interaction- with-the-environment framework of RL which shows his power by solving a large number of benchmark problems, including very difficult ones, like the bicycle driving and riding control problem. In addition, unlike most RL approaches to generalization, we develop a proof finding theoretical bounds for the convergence of the method to the optimal solution under given conditions.Mediante el uso de aprendizaje por refuerzo (RL), un agente autónomo interactuando con el medio ambiente puede aprender a tomar adecuada acciones para cada situación con el fin de lograr de manera óptima su propia meta. RL proporciona una metodología general capaz de resolver problemas de decisión complejos que pueden estar presentes en muchas aplicaciones del mundo real. Problemas RL usualmente se modelan como una Procesos de Decisión de Markov (MDP) estudiados profundamente en la literatura. La principal peculiaridad de un algoritmo de RL es que el agente es asumido para aprender las políticas óptimas de sus experiencias sin saber los parámetros de la MDP. El elemento clave en resolver el MDP está en el aprender una función de valor que da la expectativa de recompensa total que un agente puede esperar en su estado actual para tomar una acción determinada. Esta función de valor permite obtener la política óptima. En esta tesis se estudia la capacidad del SVR utilizando núcleo métodos para adaptarse y resolver problemas RL complejas en el espacio estado grande o continua. RVS puede ser estudiado mediante un interpretación geométrica en términos de margen óptimo o puede ser visto como un problema de regularización dado en un Reproducing Kernel Hilbert Space (RKHS). SVR tiene buenas propiedades sobre la capacidad de generalización y ya que se basan en una optimización convexa problema, ellos no sufren de sub-optimalidad. SVR son no paramétrico que muestra la capacidad de adaptarse automáticamente a la complejidad del problema. En consecuencia, la aplicación de RVS para aproximar funciones de valor suena para ser un buen enfoque. SVR puede resolver tanto en modo batch cuando todo el conjunto de muestra de entrenamiento están a disposición de los agentes de aprendizaje o incrementalmente que permite la adición o eliminación de muestras de entrenamiento muy eficaz. Incremental SVR encuentra las condiciones adecuadas para KKT nuevas o actualizadas de datos modificando sus influencias en la función de regresión mantener consistencia en las condiciones KKT para el resto de los datos utilizados para el aprendizaje. En los problemas de RL una RVS elemental será capaz de aproximar la función de valor de acción que conduce a la política óptima. En consecuencia, la carga de cálculo debería ser menor, la velocidad de aprendizaje más rápido y generalización más efectivo que el otro método existente La contribución general que viene de nuestro trabajo es desarrollar, formalizar, ejecutar y estudiar una nueva técnica de RL para la generalización en espacio de estados discretos y continuos con acciones finitas. Nuestro método utiliza el marco de la Approximate Policy Iteration (API) con el criterio de BRM que permite representar la función de valor de acción utilizando SVR. Este enfoque de RL es el primero que conocemos usando SVR compatible con el marco de RL con agentes interaccionado con el ambiente que muestra su poder mediante la resolución de un gran número de problemas de referencia, incluyendo los muy difíciles, como la conducción de bicicletas y problema de control de conducción. Además, a diferencia de la mayoría RL se acerca a la generalización, desarrollamos un hallazgo prueba límites teóricos para la convergencia del método a la solución óptima en condiciones dadas

    Nonlocal field theory driven by a deformed product: Generalization of Kalb-Ramond duality

    Get PDF
    A modification of the standard product used in local field theory by means of an associative deformed product is proposed. We present a class of deformed products, one for every spin S = 0, 1/2, 1, that induces a nonlocal theory, displaying different form for different fields. This type of deformed product is naturally supersymmetric and it has an intriguing duality

    The Scalar wave equation in a non-commutative spherically symmetric space-time

    Get PDF
    Recent work in the literature has studied a version of non-commutative Schwarzschild black holes where the effects of non-commutativity are described by a mass function depending on both the radial variable r and a non-commutativity parameter θ. The present paper studies the asymptotic behavior of solutions of the zero-rest-mass scalar wave equation in such a modified Schwarzschild space-time in a neighborhood of spatial infinity. The analysis is eventually reduced to finding solutions of an inhomogeneous Euler–Poisson–Darboux equation, where the parameter θ affects explicitly the functional form of the source term. Interestingly, for finite values of θ, there is full qualitative agreement with general relativity: the conformal singularity at spacelike infinity reduces in a considerable way the differentiability class of scalar fields at future null infinity. In the physical space-time, this means that the scalar field has an asymptotic behavior with a fall-off going on rather more slowly than in flat space-time

    Biomolecular electrostatics with the linearized Poisson-Boltzmann equation

    Get PDF
    4openopenFOGOLARI F.; ZUCCATO P.; ESPOSITO G.; VIGLINO PFogolari, Federico; Zuccato, P.; Esposito, Gennaro; Viglino, Paol
    corecore